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Abstract 

The positive orthant method tries to find solutions to consistent systems of inequali¬ 
ties, and approximate solutions to inconsistent systems, by maximizing a fit measure 
based on the sign function and the absolute value function. We concentrate on systems 
of linear inequalities and develop a convergent majorization algorithm. 
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Note: This is a working paper which will be expanded/updated frequently. All suggestions 
for improvement are welcome. The directory deleeuwpdx.net/pubfolders/pom has a pdf 
version, the bib hie, the complete Rmd hie with the code chunks, and the R source code. 


1 Introduction 


In metric scaling methods we have data, which we take to be a vector £ G M n , and we 
have a model, which we take to be a function F : M. p =4> M n . We want to hnd x G 
such that ( rs F(x). In non-metric scaling methods this is more specifically formulated as 
rank(£) ~ rank(F(a:)). For our purposes it is convenient to define the rank rank(x) of a 
vector x G by 

A 1 n 

rank i(x) = - ^ sign (x t -Xj), 

/ 3 = i 


where 


sign (x) 


f+1 if x > 0, 
< —1 if x < 0, 
[o if x = 0. 


Thus rank vectors are centered, and tied elements get an average rank number. 


x<-c(l ,2,3, 4,4,5) 

print (rowSums(sign(outer(x,x, "-")))/2) 


## [1] -2.5 -1.5 -0.5 1.0 1.0 2.5 

Many of the computational complications in non-metric scaling arise from the fact that 
rank : K n =>■ M n is not smooth, in fact not even continuous. 

There are other complications, however. Often the date do not consist of a single rank order, 
but of multiple rank orders. This is generally not difficult to incorporate into the various 
approaches that have been used so far. A more serious complication is that sometimes data 
are not given as rank orders, but as paired comparisons or triadic comparisons. Paired 
comparison data are not necessarily consistent with any given rank order, because they allow 
subjects to give asymmetric and intransitive judgments. In this paper we discuss non-metric 
scaling methods that deal with paired comparison data. Since rank orders, and more generally 
“rank k out of n” or “pick k out of n” data, can always be converted to paired comparisons, 
this generalizes treatment of rank ordered data. See, however, (???) for some thoughtful 
discussion how to handle triadic and similar data. 

Suppose we have the following building blocks. 

1 . fi with i — 1 , • • • , n are real valued functions on MR 

2. S = is a square matrix of order n with elements +1, —1, or 0. 

These components define a system of inequalities of the form 


Vijififa) - fj(x )) > 0. 


( 1 ) 
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We can assume, without loss of generality, that E is hollow (i.e. has zero diagonal elements). 
But we do not assume that E is anti-symmetric, i.e. that a l3 = —<Tji. Of course if a tJ = —Oji 
then aij(fi(x ) — fj(x )) > 0 and aj t (f)(x) — fi(x)) > 0 are the same inequality, and one of 
them is redundant. More generally, we allow for redundant inequalities in (1), and even for 
replications of the same inequality. 

We also allow for the possibility that a tJ = a Jt = 1 or a l3 = = —1 for a pair (i,j), in which 

case (1) requires that fi{x) = fj(x). In this way we can incorporate equality restrictions into 
our system of inequalities. 

Of course if a VJ = 0 the corresponding inequality is always trivially satisfied, whatever x is. 
By allowing for zero a t] we can proceed as if (1) is defined for all even if we only have 

observed a subset of the possible binary comparisons in our data. Or if we decide to eliminate 
redundant inequalities. In that sense zero elements of E correspond with “missing data”. 

Note that we only mention in passing the various coding decisions that have to be made. 
If rank(£) is 1, 2, 3, 4, 4, 5, for example, we can decide to code all comparisons in the 
matrix E, using the sign function. Then E is 


## 


[, 1 ] 

[, 2] 

[, 3] 

[ >4] 

[, 5] 

C, 6] 

## 

[ 1 J 

0 

-1 

-1 

-1 

-1 

-1 

## 

[2,] 

1 

0 

-1 

-1 

-1 

-1 

## 

[3,] 

1 

1 

0 

-1 

-1 

-1 

## 

[4,] 

1 

1 

1 

0 

0 

-1 

## 

[5,] 

1 

1 

1 

0 

0 

-1 

## 

[6,] 

1 

1 

1 

1 

1 

0 


Note that this implies we use what Kruskal (1964) calls the “primary approach” to ties, 
where equality means we leave the order within tie-blocks unspecified. We can also use the 
“secondary approach”, where we require equality within tie-blocks. Then E becomes 


## 


[, 1 ] 

[, 2] 

[, 3] 

[ >4] 

[, 5] 

[, 6] 

## 

[ 1 J 

0 

-1 

-1 

-1 

-1 

-1 

## 

[2,] 

1 

0 

-1 

-1 

-1 

-1 

## 

[3,] 

1 

1 

0 

-1 

-1 

-1 

## 

[4,] 

1 

1 

1 

0 

1 

-1 

## 

[5,] 

1 

1 

1 

1 

0 

-1 

## 

[6,] 

1 

1 

1 

1 

1 

0 


We could also decide to remove redundant inequalities and wind up with 

## [, 1 ] [, 2 ] [, 3 ] [ , 4 ] [, 5 ] [, 6 ] 

## [1,] 0-10000 

## [2 ,] 00-1000 

## [ 3 ,] 0 0 0 -1 0 0 

## [ 4 ,] 000010 

## [ 5 ,] 00010-1 

## [6,] 000000 
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Using the sign function on pairs of model values has a long tradition in statistics and 
psychometrics. We review some of the history, with a slight emphasis on our own work, in a 
later section of this report. 


2 Fit Function 

We measure how well the inequalities (1) are satisfied by defining the fit function 


A Ot(x) 


( 2 ) 


with 


a(z) = Y, Y, ~ fj{ x ))> 

i =1 3 = 1 
n n 

P( X )=Y.Y. W V I ( X ) - fj ( X ) ) I' 

*=1 3=1 


( 3 ) 

( 4 ) 


Because WijOij = the denominator (3 can also be written more simply as 

n n 

P{ X ) = Y.J2 W i3\(fi( X ) -fj( X ))l 
2=1 3=1 

Note that /3 is the weighted Gini Mean Difference of the fi(x) (Yitzhaki (2003), Yitzhaki and 
Schechtman (2013)). 

Maximization of (j) over x to find a solution of the system (1) if it is consistent, and an 
approximate solution if it is inconsistent, is the Positive Orthant Method. The name is chosen 
because we want the vector <Tij(fi(x) — fj(x )) to be in the non-negative orthant. 

Clearly 

~Wij(M x ) - fj( x ))\ < Vij(M x ) - fj ( x )) < Wij(M x ) - fj( x ))l 

and thus — 1 < <fi(x) < +1 for all x for which the denominator is non-zero. Also 4>(x) = 1 
if and only if <7ij(fi(x) — fj(x)) > 0 for all (i,j) for which ^ 0, again assuming the 
denominator is non-zero. 

Again, the history of this loss function is reviewed towards the end of the report. 


3 Analysis 

3.1 General Case 

If we define 

n 

Pi — ^ ) ( Wj.j(Jjj Wji<Tji), 
3=1 
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then 


n 

a ( X ) =Y,PiM X )' 

i— 1 


( 5 ) 


To verify, suppose F(x) is 

## [ 1 ] 3 4 2 5 1 

Also choose W as 


## 


[, 1 ] 

[,2] 

[ ,3] 

[ >4] 

[ , 5] 

## 

[ 1 J 

0 

0 

1 

1 

1 

## 

[2,] 

0 

0 

1 

1 

1 

## 

[3,] 

0 

0 

0 

0 

0 

## 

[4,] 

0 

0 

0 

0 

0 

## 

[5,] 

0 

0 

0 

0 

0 

and £ as 






## 


[, 1 ] 

[ , 2] 

C , 3] 

[ >4] 

[ , 5] 

## 

[ 1 J 

0 

1 

1 

1 

1 

## 

[2,] 

-1 

0 

-1 

-1 

1 

## 

[3,] 

-1 

1 

0 

1 

-1 

## 

[4,] 

1 

1 

-1 

0 

1 

## 

[5,] 

1 

1 

1 

-1 

0 


Then p is 

print (r <- rowSums (w * s - t(w * s))) 

## [ 1 ] 3-1 0 0-2 

From (3) we compute 

sum (w * s * outer (f, f, "-")) 

## [ 1 ] 3 

while the right-hand side of $(5) is 

sum (r * f) 

## [ 1 ] 3 

For the denominator of 0 we find 


P( x ) = '52Pi( x )fi( x ), 

i— 1 


( 6 ) 
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where 


Pi{x) = + w ji) a ij ( x ) 


and aij(x) = sign (fi(x) - fj(x)). 
As an example, we have for T,(x) 


## 


[,1] 

[, 2] 

[,3] 

[,4] 

[, 5] 

## 

[1J 

0 

-1 

1 

-1 

1 

## 

[2,] 

1 

0 

1 

-1 

1 

## 

[3,] 

-1 

-1 

0 

-1 

1 

## 

[4,] 

1 

1 

1 

0 

1 

## 

[5,] 

-1 

-1 

-1 

-1 

0 


From the definition in (4) we compute 

sum (w * abs (outer (f, f, "-"))) 


## [ 1 ] 11 

and the right-hand side of (6) is 

sum (f * rowSums ((w + t(w)) * s)) 

## [ 1 ] 11 


3.2 Rearrangement 

The classical rearrangement theorem in Hardy, Littlewood, and Polya ( 1952 ), Section 10 . 2 , 
Theorem 368 says that if we have two sorted vectors x\ < X 2 ■ ■ ■ < x n and yi < y 2 < • • • < y n 
and if n is any permutation, i.e. any one-to-one mapping of X n ={ 1 , 2 , • • • , n} into I n , then 

n n 

^ ^ xah T y ) Xiyir(i) ■ 

i= 1 i =1 

The rearrangement theorem was used by De Leeuw (1973) to define a family of multidimen¬ 
sional scaling methods. There is a more recent discussion in De Leeuw (2017b). In this 
section we show that the positive orthant method can also be interpreted as a rearrangement 
method. Related material on alternative representations of the Gini Mean Difference are in 
Yitzhaki and Schechtman (2013), Chapter 2, which has the promising title “More Than a 
Dozen Alternative Ways of Spelling Gini”. 

For any sign matrix S 

Wij I fi(x) ~ fj(x )| > w i:j - fj(x )) 

for all pairs (i,j). Thus, adding up these inequalities, 


P( x ) > 1ZM x )Y.] ( Wij + Wji)Sij, 

i =1 j= 1 
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with equality if and only if s t j = sign (fi{x) — fj(x)) for all pairs for which Wij(fi(x) — fj(x)) ^ 0. 
This implies 

n n 


/ 3 (» 


max 

s&s 


i =1 j =1 


T 'UJji ) $ij 


This can also be written as 


n 

P{x) = ma xJ2 r tfi(x), 

i =1 


(7) 


where 1Z is the set of vectors of the form 

n 

r% = + w ji) s ij> 

3 =1 

with S = {sjj} varying over the sign matrices (i.e. essentially over all permutations). 

We check our result by using the nextPermutationO function from De Leeuw (2016), and 
apply it to our small example. 

i <- 1:5 
m <- -Inf 
repeat { 

t <- sign(outer (i, i, 

m <- max (m, sum (f * rowSums ((w + t(w)) * t))) 
i <- nextPermutation (i) 
if (is.null (i)) break 

> 

print (m) 

## [ 1 ] 11 

which is the same as our previous result. 


(p{x) 


DU Pifiix) 

max reR Er=i nfiix) 


( 8 ) 


The numerator of 0 is a linear function of /. The denominator of <j) is the maximum of a 
finite number of functions linear in /, which means it is convex in /. This does not mean, of 
course, that it is convex in x, except when the /* itself are linear. 

Also note that if the /* are homogeneous, in the sense that /( Xx) = A r f(x) for some r > 0 
and for all x and A > 0, then (p(Xx) = <f>{x) and maximizing 0 can be done by maximizing a 
over all x for which /3 < 1. Note that (3(x) < 1 if and only if Yh=i fi ( x ) Ej=i ( w ij +Wji)sij < 1 
for all S G S. 
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If Wij = 1 for all i j and S' is a sign matrix with s l3 = sign {x/i — y 3 ), then 

n 

n = 2 Sij = 4 rankj ((/) 

3 = 1 

In that case we have, from (7), by the rearrangement theorem 


/ S{x ) = 4^ rank t (F(x))fi(x) 
1=1 

Thus, if E is the sign matrix of the data (, 

= Er=i ra nk,;(C )fj{x) 

' ' Er=i ran k i(F(x))fi(xy 

where F(x) is the vector with the f t (x). 


( 9 ) 


( 10 ) 


3.3 Multiple Data Sources 

If there are multiple data sources, for example multiple individuals in a paired comparisons 
design, then we simply combine the fit measures by adding them. This works well, because 
the denominators are all the same. We just add all numerators, and the result is still linear 
in /. 


\ _ f EEl Pijfi( x ) \ 

\ max reR E?=1 rifi(x) J 

But now consider the method of triads, for example. We could take a single triad as a 
data source, with only three non-zero weights, and sum loss functions over triads. Since the 
weights for each triad are different, the denominators are different and we cannot just add 
numerators. 



m 


H x ) = E 

3 =1 


Er ; l>ijfi(x) \ 

max reR . E?=1 r ifi(x) j 


( 12 ) 


Individual difference models are an additional level of complication. The model F is different 
for each j. because there are parameters for individuals or replications, and the combined 
loss is consequently more involved. 


m 


4>{x) = Z 

3=1 


EEi pijfi{x,yj) \ 
max reRj E”=i r ifi( x , yj) J 


( 13 ) 



4 Linear Nonmetric Problems 


This section improves on the rather shaky early papers De Leeuw (1968b) and De Leeuw 
(1969). In the linear case we write ffx) = f-x. Note that this implies that — f}x) > 0 

is automatically satisfied if fi = fj. Also note that because we take differences we cannot 
estimate an intercept. It also means that a{x) = x'F'p. We can require without loss of 
generality that a(x) > 0. 

4.1 Linear Programming 

The positive orthant method is equivalent to the constrained optimization problem 


max (52pifi(x) | ^2 r ifi(x) < 1 ,Vr G ft}. (14) 

1=1 1=1 

If the fi are linear in x, as they are in non-metric regression and discriminant analysis, then 
(14) is a linear programming problem, with a large, but finite, number of linear constraints. 
There are n\ sign matrices, and thus equally many linear constraints. It should be possible 
to cleverly add constraints one at a time, and update the solution. This gives an iterative 
procedure which converges to the global maximum of 0 after a finite number of steps. 
Developing this algorithm will have to wait for another publication. 


4.2 Majorization 


Alternatively, we can apply majorization starting from an initial estimate, and improving the 
fit iteratively. Optimizing functions that depend on the absolute value function can have run 
into difficulties, because of lack of differentiability at the orign. It makes sense to replace the 
absolute values by an approximation which can be made arbitrary close, and which remains 
smooth. The most common choice is \x\ « \/ x 2 + e, which has been shown to have some 
optimal properties (Ramirez et al. (2014)). More precise smooth approximations have been 
studied (Voronin, Ozkaya, and Yoshida (2015), Bagul (2017)) but they are computationally 
more complicated to deal with. 


Thus, instead of maximizing 0, we maximize 0 £ , which is 

^ A «(*) 


0e(x)’ 


where 

n n 

0e(x) = Y,Y, W V\/ (CA - fj)'x) 2 + e 

i= 1 3=1 

Note that 0 e (x) > 0(x), and thus 0(x) < f e (x). Remember we always choose x such that 
a(x) > 0. 


By the AM-GM inequality we have 


fj)'x) 2 + e < 


1 1 
^ y/((fi - fj)'y) 2 + e 


{(Ui-fjyxf+u 


fj)'y ) 2 + 2e} • 
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for all x and y. Consequently 


A 1 , 


1 „ 


Pe(x) < 7 e(x, y ) = -x'B e (y)x + -y'B e (y)y + e EE 


w 


131 


(15) 


where 


B.(») = £E , = , 2 Ui - m, - fiY 

i-lj-l V(Ui “ fi) y) + e 


(16) 


Note that B e (y) is positive semi-definite for all y. Dehne rtA fc+ b by maximizing 8 e {x,x ^), 
where 

«,(*,») ^ “ w 


7e(®,y)’ 


Now 


0 e (a; ( ' fc+1) ) > <5 e (x^ fe+1 \a:®) > <5 e (x^,x®) = (j) e (x^), 

which means we have a proper majorization algorithm. We maximize S e (x,x^) by setting 
the derivative equal to zero. This gives 

x (k+i) = A B t (x {k) )- l F’r 

for some appropriate A. Note that 5 e (x,x is not invariant under multiplication of a; by a 
constant, unlike (f>(x), so we have to hnd the optimal A. We must maximize 

s fa 0m) xW) = __ 

’ \X 1 r'FB,(x^)- 1 F'r+ \xM'B, (#))#) + «££ 


w. 
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from which 


A = 


xW'B e (xW)xW + 2e E E Wi. 

\ r 1 F5 £ (aA fe )) -1 F'r 


4.3 Initial Estimate 

We can hnd an approximate solution, or an initial point for an iterative maximization process, 
by maximizing 


1 p(x) = 


y'E’LiE’U «%((/, -/,)'U 2 


Dehne 

n n 

r=££<%(/. -MVi-fiY- 

i =1 3 = 1 

Then the optimal re is just rr = V~ X F'p. 
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4.4 Example: Neumann 

Our first example are the data from Neumann, used previously to illustrate nonmetric linear 
regression by Gifi (1990), De Leeuw and Mair (2009), and De Leeuw (2017a). 

data(neumann, package="Gif i") 

Our first analysis uses the complete sign matrix, with primary approach to ties. 

g <- neumann$density 

f <- cbind(neumann$temperature,neumann$pressure) 
ss <- sign(outer (g, g, 


The results of the linpomO function, for different values of e, are given below. In each 
case we iterate until the increase of the fit function (phi_e in the table below) is less than 
1(T{-10}. 


## 

eps: 

1.0e-01 

itel: 

14 phi_e: 

0.881518 phi: 

0.992111 

x: 

-0.023688 

0.002955 

## 

eps: 

1.0e-02 

itel: 

21 phi_e: 

0.969609 phi: 

0.992127 

x: 

-0.020392 

0.002536 

## 

eps: 

1.0e-03 

itel: 

29 phi_e: 

0.989094 phi: 

0.992156 

x: 

-0.020120 

0.002487 

## 

eps: 

1.0e-04 

itel: 

25 phi_e: 

0.991780 phi: 

0.992168 

x: 

-0.020103 

0.002473 

## 

eps: 

1.0e-05 

itel: 

20 phi_e: 

0.992119 phi: 

0.992168 

x: 

-0.020104 

0.002471 

## 

eps: 

1.0e-06 

itel: 

17 phi_e: 

0.992162 phi: 

0.992169 

x: 

-0.020108 

0.002472 


For this example, in which there are only a few ties, we do not expect the secondary approach 
to be that different from the primary one. 

ss[ss==0] <- 1 
diag(ss) <- 0 

And indeed, the results are virtually the same, although the fit function is slightly different. 

## eps: 1.0e-06 itel: 17 phi_e: 0.990859 phi: 0.990866 x: -0.020101 0.002472 

We can also eliminate redundant inequalities and only have ay, = +1 when Q is the largest 
of all data elements strictly less than Q, i.e. the next element in the rank order. If there are 
ties we have a tJ = +1 for more than one element, and withn tie blocks we have a tJ = 0. 

n <- length (g) 
st<-matrix(0, n, n) 
for (i in l:n) { 

ind <- which (g < g[i]) 
if (length (ind) == 0) next 
mnd <- max (g[ind]) 
jnd <- which(g[ind] == mnd) 
st[i, ind [jnd]] <- 1 

> 
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The linpomO analysis gives results on a quite different scale. 

dim (st) 

## [1] 65 65 
dim(f ) 

## [1] 65 2 

hnr <- hns 

#hnr <- linpom (f, st, aeps = le-6, xold, = NULL, eps = le-10, Umax = 100, verbose = F 

## eps: 1.0e-06 itel: 17 phi_e: 0.990859 phi: 0.990866 x: -0.020101 0.002472 

Although the coefficients and fit function values are very different from the ones we had 
before, the plots of “observed” vs “predicted” are very similar. 

Redundant 


o 
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Reduced 


o 



For the “redundant” soution the correlation between observed and predicted is 0.9573195841, 
and Kendall’s tan is 0.957319564. For the “reduced” solution this is 0.9348822832 and 
0.9348822832. 

5 Binary Regression 

If the data are binary (success/failure, yes/no, alive/death, trump/clinton) then obviously 
treatment of ties becomes rather essential. It seems the primary approach is more natural. 

It tries to find a separating hyperplane, and thus it is similar to logistic regression, support 
vector machines, and discriminant analysis. 

As an example we use the breast cancer data from Newman et al. (1998), included in the R 
package mlbench (Leisch and Dimitriadou (2010)) 

data(BreastCancer) 

bccom <- BreastCancer [complete.cases (BreastCancer), ] 
g<-as .numeric (bccom$Class) 
g <- 2 * g - 3 

h<-as.matrix(bccom[,c(-l,-ll)] ) 
f <- array (0, dim(h)) 

for (j in 1:9) f[,j] <- as.numeric(h[, j]) 
s <- sign(outer (g, g, "-")) 

hbin <- linpom (f, s, aeps = le-6, eps = le-10, itmax = 1000, verbose = FALSE) 


13 



## eps: 1.0e-06 itel: 418 phi_e: 0.998821 phi: 0.998821 x: 0.041302 -0.002514 0.041981 0 

The secondary approach is conceptually not very appealing. We require the predicted values 
for all negative instances to be equal, and similar for the predicted values of the positive 
instances. 

s [s==0] <-l 
diag(s) <-0 

hus <- linpom (f, s, aeps = le-6, eps = le-10, itmax = 1000, verbose = FALSE) 

## eps: 1.0e-06 itel: 82 phi_e: 0.839712 phi: 0.839754 x: 0.004107 0.044646 0.014937 0.0 

Both the primary and secondary approach have the disadvantage that the matrix E = {ay,} 
is quite large. To make computations more efficient, and more like the usual computations in 
logistic regression and the like, we define our positive orthant inequalities as 

Oif[x > 0 , 

where cy = +1 if i is is positive response and cy = —1 if the instance is negative. We also 
add a column with ones to F to estimate a cutoff value. The fit function becomes 

I%=i w i\fi x \ ’ 

where we assume without any real loss of generality that both all wy are positive and all a % 
are non-zero. We can now apply exactly the same results as before to find a majorization 
algorithm, which we have implemented in the R function binpom(). 

The inequality (15) is still valid, but instead of (16) we now have 


Be(y) = E 


Wi 


-fifi- 


(17) 


i =1 y[ (fly) 2 + e 

hbin <- binpom (f, g, aeps = le-6, eps = le-10, itmax = 500, verbose = FALSE) 

## eps: 1.0e-06 itel: 111 phi_e: 0.984996 phi: 0.984999 x: -4.960047 0.244466 -0.077994 


6 Paired Comparisons 

In a typical paired comparison experiment each of m subjects generates a matrix E/ ; , with 
elements —1 and +1. If we have units weights, the vector pj~ has elements 

n 

Pik ) ) (Gijk @jik) ■ 

3 =1 

The results of paired comparison experiments are often presented in aggregated form, summed 
over subjects or occasions. An an example, consider the vegetable data of Guilford, from the 
psych package (Revelle (2018)). 


14 





library (psych) 
data (vegetables) 
veg 


## 

## Turn 
## Cab 
## Beet 
## Asp 
## Car 
## Spin 
## S.Beans 
## Peas 
## Corn 


Turn 


Cab 

Beet 


Asp 


Car 

Spin 1 

5.Beans 

Peas 

Corn 

0 

.500 

0 . 

.818 

0 

.770 

0 . 

,811 

0 , 

.878 

0 

.892 

0 , 

.899 

0 

.892 

0 

.926 

0 

. 182 

0 . 

.500 

0 

.601 

0 . 

,723 

0 , 

.743 

0 

.736 

0 . 

.811 

0 

.845 

0 

.858 

0 

.230 

0 . 

.399 

0 

.500 

0 . 

,561 

0 , 

.736 

0 

.676 

0 . 

.845 

0 

.797 

0 

.818 

0 

. 189 

0 . 

.277 

0 

.439 

0 . 

,500 

0 , 

.561 

0 

.588 

0 . 

.676 

0 

.601 

0 

.730 

0 

. 122 

0 . 

.257 

0 

.264 

0 . 

,439 

0 , 

.500 

0 

.493 

0 . 

.574 

0 

.709 

0 

.764 

0 

. 108 

0 . 

.264 

0 

.324 

0 . 

,412 

0 , 

.507 

0 

.500 

0 . 

.628 

0 

.682 

0 

.628 

0 

. 101 

0 . 

. 189 

0 

. 155 

0 . 

,324 

0 , 

.426 

0 

.372 

0. 

.500 

0 

.527 

0 

.642 

0 

. 108 

0 , 

. 155 

0 

.203 

0 . 

,399 

0 , 

.291 

0 

.318 

0. 

.473 

0 

.500 

0 

.628 

0 

.074 

0 . 

. 142 

0 

. 182 

0 . 

,270 

0 , 

.236 

0 

.372 

0 , 

.358 

0 

.372 

0 

.500 


In the aggregated table N cell n %3 indicates the number of individuals who prefer i to j, 
i.e. the number or proportion of individuals for which = +1. Thus 


m 

'y ~ @ijk f^ij 
k =1 


and 

m n 

^ Pik = 2 — Tiji). 

k =1 j =1 

s <- as.matrix (veg) 
s <- s - t(s) 

hpc<-pcpom(s , aeps = le-6, eps = le-10, itmax = 1000, verbose = FALSE) 

## eps: 1.0e-06 itel: 78 phi_e: 0.721500 phi: 0.721500 x: 1.000000 -0.125000 - 

In this case we find a scale with a single positive element, corresponding to the largest element 
of p , while the other n — 1 elements are all negative and equal. For such solutions the fit is 
max(p)/(u — 1), or in our case 0.7215. Clearly such a solution is disappointing, because we 
do not need elaborate majorization iterations to invariably come up with the largest element 
in a vector. 


7 Multidimensional Scaling 

Multidimensional scaling of squared distances means fi(x ) = x'AiX. Thus globally maximizing 
c f> means maximizing a non-convex quadratic form over a region defined by n\ non-convex 
quadratic inequality constraints. This seems a pretty hopeless task. A mental picture of the 
problem also suggests how serious the local maximum problem is bound to be. 

Scaling of distances has fi(x) = \Jx'AiX, which is even more complicated than the squared 
distance case. So the best we can do is start an infinite iterative procedure with an initial 


0.125000 



solution that is as close as possible, using some form of classical scaling, and then improve 
the solution with gradient or majorization steps until we find a local maximum. Again, this 
is a topic for further research. 

8 History 

The history of positive orthant type loss functions is actually more complicated than our 
previous discussion suggests. I review that history here, which will give me the opportunity 
to reconstruct my thoughts on the matter of fifty years ago 

The idea of measuring fit by looking at differences between model values was inspired by 
early work of Kendall (1938) and Daniels (1944). Just to refresh your memory: suppose x 
and y are two vectors of length n and we want to measure some form of correlation between 
the two. Here is a nice classical quotation. 

“Let us assign to each pair ( Xi,Xj ) what for convenience will be called a score a^- and to each 
{jJiiVj a score b l3 , where 

dij Up, bij bjj . 

Denote by I the number 

-p X) T't bjj 

the summation extending over all i and j from 1 to n .” (Daniels, 1944, p. 129). 

Daniels goes on to point out that a l3 = Xi — Xj gives the Pearson correlation, a i3 = sign(;rj — xf) 
gives Kendall’s tau, and a t] = rank(rc)i — (rank(a;)j gives Spearman’s rho. 

I am quite sure that this idea inspired my early work. In De Leeuw (1968a) fit measures for 
a model F = {/;}$ are introduced in the form 

n n 

£5>v(/i(*) - fj(x)) 
i =1 j =1 

where the anti-symmetric weights a t j are derived from the data ( and are either Kendall, 
Spearman, or Pearson weights. 

There was another very important influence, which we will discuss next. Guttman’s work on 
scale construction, while he was working for the army during WW II, still looms over much of 
psychometrics. It was elaborated on and tinkered with, but it was never improved, because it 
was perfect. For our purposes the key publication is Guttman (1946), which curiously enough 
appeared the Annals of Mathematical Statistics. 

“The new criterion adopted is that the numerical values be determined so as best to distinguish 
between those things judged higher and thoise judged lower for each individual .” (Guttman 
(1946), p 144) 

Guttman’s notation for paired comparisons is somewhat different from ours. He codes paired 
comparisons using zero and one, instead of plus and minus one. Thus for subject or replication 
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k define 


e ijk ~ 2 { a ijk + 1)- 

Now suppose x is the scale we are looking for, in deviations of the mean. The average value 
of objects that subject k judges higher or larger or better than something else is 


tk 


gjU gjU e ljkXj 

E n v^n ’ 

j =1 1 tijk 


and the average of objects judged smaller is 

S? =1 D"=i W 

E n 

7=1 fcjifc 


Mfc = 


We now want to maximize 


^ ) (tk ^k ) 


fc=l 


over x normalized by a;'a; = f. In a complete paired comparison experiment in which 
e^k + £jik = 1 for all i ^ j we have tk = — Uk- We also have 


1 ™ sr^n __ o / 

2 ^i=l *0 ^i=l _ 2 ipi 


tk — 


|n(n — 1) n(n — 1) ’ 

where the p & are the ranks corresponding to the Thus we can maximize 


J2 f2 k = x ' j E PkPk 


x 


k =1 


< /c=l 


over x'x = 1, which mean we find the largest eigenvalue and corresponding eigenvector of the 
cross-product matrix of the centered rank numbers. This means that Guttman’s signature 
based method is equivalent to the method of Slater (1960) or Carroll and Chang (1964), 
which would later become to be identified with the MDPREF program (Carroll and Chang 
(1968)). 

We start with De Leeuw (1968b) on nonmetric discriminant analysis. This proposes the loss 
function 

, ,a e;u(K,m,) 2 

where ti = (J t f-x. The partial derivatives are given, and a gradient method is suggested. 

The actual algorithm in the report, implemented in PL/I for the IBM 360/50, minimizes 

k(x, y ) =(Fx - y)\Fx - y) 

over x and y > 0 by alternating least squares (in fact, this is possibly the first use of that 
term). Of course 

1 n 

min K(x,y) = t^(|L:| ~U) 2 , 


y> o 


i= 1 
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which is the numerator of r. Although this is not mentioned in the report, this result also 
shows that r is a differentiable function of /, despite the absolute value signs. After each 
update of x it is normalized by x'F'Fx = 1. The ALS process converges to a local minimum 
of both n and r. 

A note in the back of the report says: “Although there is no program for NDA in the 
G-L-series as yet, professor Louis Guttman informs me that he would favor the absolute value 
principle. From the (up to now) rather sketchy accounts of this principle I gather it is quite 
similar to my positive orthant method.” 

De Leeuw (1968c) defines the positive orthant method, in the context of nonmetric multi¬ 
dimensional scaling, as the minimization of one of a family of loss functions . As in (1) we 
define 

tij(x) = <Tij{fi{x) - fj(x )), 

and then, for some q > 0, 


T « ( * )= *31, 


v \ x ) I 

O rij 1 


Uj(x)) q 

x)\ q 


(18) 


The matrix W = {tty,-} has non-negative weights. We assume, without loss of generality, that 
Wij\(Jij\ = Wij , i.e. if a.ij = 0 then also w iq = 0. Clearly 0 < r q (x) < 1 with r q (x) = 0 if and 
only if tij(x) > 0 for all i and j. 

Using Wij\<Jij\ = Wij and sign(oy,) = we can show that r q (x) = |(1 — <f> q (x)), where 


x = 


Z %=1 E "=1 WjjOjjifiix) - fj(x))\fi(x ) - fj(x)\ 
, 'I" , irr, /;(./•) ./)(•/•) '' 


< 7-1 


Thus minimizing r q is equivalent to maximizing <p q . See also Young (1975) or Hartmann 
(1979) for additional accounts of this method, and some historical context. 

De Leeuw (1968c) and De Leeuw (1970) discussed the general case q > 0 and the limiting 
cases with q —> 0 and q —> +oo, but the report is computationally rather naieve, because it 
just suggest to apply the gradient method and it ignores the fact that the absolute value 
function is not differentiable. The report discusses a 

From a computational point of view the most interesting members of the family (18) are 
T\ and r 2 . The case q = 2 has been discussed in Guttman (1969), as the “absolute value 
principle”, and in Johnson (1973) as the “pairwise method”, again just suggesting to use 
gradient methods. 


gg=igj=i Wijaijsignifijx) - fj(x))(fj(x) - fj(x)) 2 

E?=i ZU w M(x) - f q (x)y 

ri^EUw^iMx) - f^x)) 

[X> - fj{x)\ 
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The positive orthant method is introduced in De Leeuw (1968c). Chapter 3 in that report 
introduces the loss function 

, ,a E?=i Widul-uy 
Tq[X) 

The PL/I program NMSPOM for minimizing r q in the case of multidimensional scaling, using 
various gradient methods, is decribed, and some examples are given. 

The next report in the sequence is De Leeuw (1969). In our notation it suggests maximizing 

SUE UoM'i* ~ fr) 

s/lZMl *) 2 


9 Code 

9.1 linpom.R 

linpom <- function(f, 

s, 

w = array (1, dim(s)), 

xold = NULL, 

aeps = le-06, 

itmax = 100, 

eps = le-06, 

verbose = TRUE) 

{ 

w[s == 0] <- 0 

r <- rowSums(w * s - t(w * s)) 
wsum <- sum(w) 
v <- -(w + t(w)) 
diag(v) <- -rowSums(v) 
g <- crossprod(f, v 7*7 f) 

u <- colSums(r * f) 

if (is.null (xold)) { 
xold <- solve (g, u) 

> 

yold <- drop(f 7*7 xold) 

nold <- sqrt ( (outer (yold, yold, "-") 2) + aeps) 

pold <- sum(xold * u) / sum(w * nold) 
itel <- 1 
repeat { 

v <- w / nold 
v <- -(v + t(v)) 
diag(v) <- -rowSums(v) 
h <- crossprod(f, v 7*°/ f) 
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m <- sum (xold * (h 7»*7o xold)) 
xnew <- solve (h, u) 

lbd <- sqrt ((m + (2 * aeps * wsum))/ sum (u * xnew)) 

xnew <- lbd * xnew 

ynew <- drop(f 7*7, xnew) 

nnew <- sqrt ( (outer (ynew, ynew, "-") 2) + aeps) 

pnew <- sum (xnew * u) / sum(w * nnew) 
if (verbose) 
cat ( 

"Iteration: ", 

f ormatC(itel , width = 3, format = "d"), 

"pold: ", 
formate ( 
pold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"pnew: ", 
formate ( 
pnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if (((pnew - pold) < eps) I I (itel == itmax)) 
break 

nold <- nnew 
pold <- pnew 
itel <- itel + 1 

} 

fend <- sum (u * xnew) / sum (w * abs (outer (ynew, ynew, "-"))) 
return(list (x = xnew, p = pnew, f = fend, itel = itel)) 


9.2 binpom.R 

binpom <- function(f, 

s, 

w = rep (1, length(s)), 
xold = NULL, 
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{ 


aeps = le-06, 
itmax = 100, 
eps = le-06, 
verbose = TRUE) 


f <- cbind(l, f) 
r <- crossprod (f, w * s) 
wsum <- sum(w) 
g <- crossprod(f, w * f) 
if (is .null(xold)) { 
xold <- solve (g, r) 

> 

yold <- drop(f 7»*7o xold) 

nold <- sqrt(yold 2 + aeps) 

pold <- sum(xold * r) / sum(w * nold) 

itel <- 1 

repeat { 

v <- w / nold 
h <- crossprod(f, v * f) 
m <- sum (xold * (h 7o*7o xold)) 
xnew <- solve (h, r) 

lbd <- sqrt ((m + (2 * aeps * wsum)) / sum (r 
xnew <- lbd * xnew 
ynew <- drop(f 7 0 *7„ xnew) 
nnew <- sqrt (ynew 2 + aeps) 
pnew <- sum (xnew * r) / sum(w * nnew) 
if (verbose) 
cat ( 

"Iteration: ", 

formate (itel, width = 3, format = "d"), 
"pold: ", 
formate ( 
pold, 

digits = 8, 
width = 12, 
format = "f" 


), 

"pnew: ", 
formate ( 
pnew, 

digits = 8, 
width = 12, 
format = "f" 

), 


* xnew)) 
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\n 


) 

if (((pnew - pold) < eps) I I (itel == itmax)) 
break 

nold <- nnew 
pold <- pnew 
itel <- itel + 1 

} 

fend <- sum (r * xnew) / sum (w * abs (ynew)) 
return(list ( 
x = xnew, 
p = pnew, 
f = fend, 
itel = itel 

)) 

} 


9.3 pcpom.R 

pcpom <- function(s, 

w = array (1, dim(s)), 

xold = NULL, 

aeps = le-06, 

itmax = 100, 

eps = le-06, 

verbose = TRUE) 

{ 

w[s == 0] <- 0 

r <- rowSums(w * s - t(w * s)) 
wsum <- sum(w) 
r <- r - mean (r) 
n <- length(r) 
wsum <- sum(w) 
if (is.null (xold)) { 
xold <- r 

} 

nold <- sqrt ( (outer (xold, xold, "-") 2) + aeps) 

pold <- sum(xold * r) / sum(w * nold) 
itel <- 1 
repeat { 

v <- w / nold 
v <- -(v + t(v)) 
diag(v) <- -rowSums(v) 
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m <- sum (xold * (v °/ 0 *°/ 0 xold)) 
xnew <- solve (v + 1 / n, r) 

lbd <- sqrt ((m + (2 * aeps * wsum)) / sum (r * xnew)) 
xnew <- lbd * xnew 

nnew <- sqrt ( (outer (xnew, xnew, "-") 2) + aeps) 

pnew <- sum (xnew * r) / sum(w * nnew) 
if (verbose) 
cat ( 

"Iteration: ", 

formatC(itel , width = 3, format = "d"), 

"pold: ", 
formate ( 
pold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"pnew: ", 
formate ( 
pnew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if (((pnew - pold) < eps) I| (itel == itmax)) 
break 

nold <- nnew 
pold <- pnew 
itel <- itel + 1 

} 

fend <- sum (r * xnew) / sum (w * abs (outer (xnew, xnew, "-"))) 
return(list ( 
x = xnew, 
p = pnew, 
f = fend, 
itel = itel 

)) 

} 
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